ABSTRACT

REC/8 is a CONVERT program, realized in the CTSS LISP of
Project MAC, for compiling REC expressions into the machine language
of the PDP-8 computer. Since the compilation consists in its majority
of subroutine calls (to be compiled, after removal of LISP parentheses,
by MACRO-8), the technique is applicable with trivial modification to
any other computer having the subroutine jump and indirect transfer
instructions. The purpose of the program is both to compile REC
expressions and to illustrate the workings of the REC language, and
accordingly a description of this language is given. It contains
operators and predicates; flow of control is achieved by parentheses
which define subexpressions, colon which implies iteration, and
semicolon which terminates the execution of an expression. Predicates
pass control to the position following the next colon or semicolon,
allowing the execution of alternative expression strings.

REC (REGULAR EXPRESSION COMPILER) is a programming language of
simple structure developed originally for the PDP-8 computer, but
readily adaptable to any other general purpose computer. It has been
used extensively in teaching Algebra and Numerical Analysis in the
ESFM, even for programming hand calculations with the Friden electronic
desk calculator. In rather vague terms, it derives its appeal from the
fact that computers can be regarded in one way or another as Turing
Machines with very elaborate built-in shortcuts to eliminate the
grotesque inefficiency of manipulating individual bits on a single
linear tape. A Turing Machine consists of a finite state machine acting
as the control of a tape memory; finite state machines in turn are
conveniently described by regular expressions. The REC notation is a
manner of writing regular expressions more amenable to programming the
Turing Machine which they control. If one does not wish to think so
strictly in terms of Turing Machines, REC expressions still provide a
means of defining the flow of control in a program, which is quite
convenient in many applications.

Let Σ be an alphabet, which presumably would not contain among
its letters the operational signs which we shall introduce. We then
define a REC expression recursively in the following manner.

  i) λ is a REC expression

 ii) () is a REC expression

iii) if σ ∈ Σ ∪ {:, ;, •}, σ is a REC expression

 iv) if α and β are REC expressions, so is αβ

  v) if α is a REC expression, so is (α)

The operational signs are used as follows. Parentheses are used
to denote a single expression. Concatenation is implied by writing
expressions in sequence. Colon [:] implies iteration of all the
expression which precedes the colon. Semicolon [;] terminates the
concatenation of a string. The large period [•] indicates a choice
between continuing to concatenate the following expressions or to pass
over them until the next following colon or semicolon (if any) of the
same parenthesis level is reached. Such a choice is always implied
following a parenthesized expression.
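
The recursive definition can be made concrete with a small reader. The following Python sketch is not part of the original program; it assumes that every letter and every operational sign is a single character, and the function name `parse` is hypothetical. It turns a written REC expression into nested lists, one list per pair of parentheses, and rejects unbalanced parentheses.

```python
def parse(text):
    """Read a REC expression into nested lists, one list per parenthesis pair.

    Assumption: each letter or operational sign is a single character;
    blanks are ignored.
    """
    stack = [[]]                      # stack[-1] is the list being filled
    for ch in text:
        if ch.isspace():
            continue
        if ch == "(":
            stack.append([])          # open a new subexpression
        elif ch == ")":
            if len(stack) == 1:
                raise ValueError("unmatched right parenthesis")
            finished = stack.pop()
            stack[-1].append(finished)  # rule v): (α) is an expression
        else:
            stack[-1].append(ch)      # rules i) and iii): a single sign
    if len(stack) != 1:
        raise ValueError("unmatched left parenthesis")
    return stack[0]                   # rule iv): the concatenation


if __name__ == "__main__":
    print(parse("(•a;b;)"))
```

Keeping each parenthesized group as its own list reflects the remark that parentheses denote a single expression; concatenation is simply list order.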



It is to be noted that parentheses have a very technical use in
REC expressions, and are more than simple signs of grouping. Thus, since
concatenation is associative, it is always written in its extended form
without parentheses. When some grouping is desired to be shown, some other
symbol, such as square brackets, should be used. The non-associativity of
REC parenthesization is often exploited to achieve some economy or
simplification of expression.

To see the correspondence between regular expressions and REC
expressions, we first show how any regular expression is to be written
as a REC expression.

          ∅  →  ()

          λ  →  λ

          σ  →  σ

         αβ  →  αβ

      α ∪ β  →  (•α;β;)

         α*  →  (•α:;)

For the converse process of writing the regular expression
corresponding to a REC expression, it is more convenient to show how
to use a REC expression to construct a transition system, whose regular
expression (or class of equal regular expressions) may then be deduced.
The algorithm is as follows, recursively defined.

 1) For every REC expression there will be an initial state and two
    final states, labelled T and F.

 2) A REC expression is to be read from left to right, but any
    quantity appearing within parentheses is to be treated as a
    single expression, recursively, with additional rules governing
    how to join its initial and final states to those outside the
    parentheses.

 3) If σ ∈ Σ is seen, draw an arrow labelled σ from the last state
    to a new state.

 4) If λ is seen, do the same with a spontaneous transition.

 5) If : is seen, draw an arrow representing a spontaneous
    transition back to the initial state.

 6) If ; is seen, draw an arrow representing a spontaneous
    transition to the final state T.

 7) Whenever either : or ; is seen, a new state should be formed.

 8) If • is seen, draw an arrow representing a spontaneous
    transition to the state immediately following the next : or ;,
    if any; otherwise to the final state T.

 9) If a parenthesized expression is seen, apply the entire
    algorithm to the parenthesized expression. When this is done,
    draw an arrow representing a spontaneous transition from the
    last state to the initial state of the enclosed expression. The
    last state written is the final state F of the enclosed
    expression, and is to be connected by a spontaneous transition
    to a subsequent state in exactly the same manner as had a • been
    seen. The final state T is to be used as the current state in
    continuing to process the outer level.

10) The last state to be written is the final state F of the outer
    expression; or should be connected to it by a spontaneous
    transition if the latter has already been prepared. The final
    state T is the accepting state of the transition system.
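
The ten rules can be followed mechanically. The Python sketch below is an illustration under stated assumptions, not the author's procedure: states are numbered integers, a spontaneous transition is an edge labelled `None`, expressions are given as nested lists of one-character signs, and the names `build` and `accepts` are hypothetical. The large period is written `•`. The two sample expressions are a union and an iteration chosen for this sketch.

```python
from itertools import count


def build(expr, edges, fresh):
    """Apply rules 1 to 10 to one expression; return (initial, T, F)."""
    initial, true_final = next(fresh), next(fresh)         # rule 1
    current, pending = initial, []    # pending: sources of • transitions
    for item in expr:                                      # rule 2
        if isinstance(item, list):                         # rule 9
            sub_initial, sub_true, sub_false = build(item, edges, fresh)
            edges.append((current, None, sub_initial))
            pending.append(sub_false)   # F behaves as if • had been seen
            current = sub_true
        elif item == "•":                                  # rule 8
            pending.append(current)
        elif item in (":", ";"):                           # rules 5, 6, 7
            target = initial if item == ":" else true_final
            edges.append((current, None, target))
            current = next(fresh)       # state following the delimiter
            edges.extend((p, None, current) for p in pending)
            pending = []
        else:                                              # rules 3, 4
            new_state = next(fresh)
            label = None if item == "λ" else item
            edges.append((current, label, new_state))
            current = new_state
    # rule 8, no delimiter ahead: pass to the final state T
    edges.extend((p, None, true_final) for p in pending)
    return initial, true_final, current                    # rule 10


def accepts(expr, word):
    """Does the transition system built from expr accept word?"""
    edges = []
    initial, true_final, _ = build(expr, edges, count())

    def closure(states):
        seen, stack = set(states), list(states)
        while stack:
            state = stack.pop()
            for src, label, dst in edges:
                if src == state and label is None and dst not in seen:
                    seen.add(dst)
                    stack.append(dst)
        return seen

    current = closure({initial})
    for letter in word:
        moved = {dst for src, label, dst in edges
                 if src in current and label == letter}
        current = closure(moved)
    return true_final in current


if __name__ == "__main__":
    union = ["•", "A", ";", "B", ";"]
    star = ["•", "A", ":", ";"]
    print(accepts(union, "A"), accepts(union, "AB"), accepts(star, "AAA"))
```

In both samples the state created after the last delimiter is the final state F, and nothing leads into it, so F is isolated in the resulting diagram.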

As an example of the process, let us consider the transition
system which we would produce from the REC expression

CR P. : Q* (R Q*;:) stfs) 
by following the above rules. 




It will be seen that the three F states are all isolated because 
according to this particular REC expression there is no way to arrive at 
any of them. 

It will be noted that the REC expressions which are derived from
regular expressions by the prescription we have offered form a limited
class among the possible REC expressions. In part this is due to a
desire to leave the semantics of the REC expressions relatively weak, even
though it admits a great number of expressions which would produce useless
transition diagrams; for instance we do not exclude the sequence ::;::.



But there is also the consideration that although regular expressions
are defined with binary connectives, such as union and concatenation,
those connectives are associative and are generally written in
parenthesis free form. Although REC parentheses are not associative,
there nevertheless exist convenient n-ary forms equivalent to their
corresponding binary forms. For example, in a triple union one could
write

     (A ∪ B) ∪ C   as   (•(•A;B;);C;)

     A ∪ (B ∪ C)   as   (•A;(•B;C;);)

but preferable to both is writing

     A ∪ B ∪ C     as   (•A;•B;C;)

with a similar notation corresponding to a more extensive union.
In fact, such merit as there might be to the REC notation arises from the
fact that although it might be somewhat cumbersome to make direct
transcriptions of regular expressions, there will be a consequential class
of expressions which we will wish to write, whose REC form will be simpler
and more convenient than the corresponding regular expressions. Thus the
correspondence which we have established serves to demonstrate that the
totality of REC expressions is no more nor less general than the totality
of regular expressions.

Since the intention of REC expressions is to control the operation
of a general purpose computer (or more specifically a Turing Machine), we
will expect the letters of the REC alphabet to represent individual
operations of which the machine is capable. For this reason the letters
will be called operators. Words of the REC alphabet will then correspond
to sequences of operations, carried out in the order given. The transition
system derived from a REC expression will then accept a word of this
alphabet if it corresponds to a possible series of operations which could be
carried out during the calculation in question. In the case of a Turing
Machine, the operators will be to write a symbol, compare a symbol, move the
tape left or move the tape right. But the operators will have to be chosen
according to the circumstances.

In reality we are not so much interested in recognizing a possible
calculation as in prescribing the particular one which we want among all
those possible. It is for this purpose that the large period [•] was
introduced, which is related to the operation of union in a regular
expression. At each place in a REC expression where • occurs, there
is a spontaneous transition in the transition diagram, indicating the
possibility of a selection among two alternatives: to continue the
regular sequence, or to start a new one by following the spontaneous
transition. To specify a particular word among all those represented
by a given REC expression, it is only necessary to specify this choice
at each place where it becomes possible. We might even assume that
there are special operators whose purpose is to make this choice. They
are called predicates, and will always combine the symbol • implicitly.
Thus a predicate is a combination of an appropriate operator followed by
the symbol •. We will moreover say that a predicate takes the value
true or false according to whether the decision is made to continue
in the regular sequence or to follow the spontaneous transition past
the nearest colon or semicolon. Every parenthesized expression is
automatically assumed to be a predicate, although analysis may show that
it is only capable of assuming one of the two possible values. Such
was the case in our example.

The transition diagrams of REC expressions have two final states
to accommodate their usage as predicates. Thus a calculation definitely
fails, definitely succeeds, or else is in progress. Moreover the REC
notation has been particularly chosen to facilitate the formation of
Boolean combinations of its subexpressions. Thus the combination AND of
the predicates a, b, c, ..., n is written

     (abc ... n;),

a notation which is valid for any number of arguments. Thus (;) always
is a true predicate, whilst a = (a;).

The combination OR of these same predicates would be written

     (a; b; c; ...; n;),

which again holds for any number of arguments. () is a predicate which
is always false, and as before, (a;) = a.

The complement of the predicate x is written

     (x).

We accordingly always have x = ((x)).
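
These identities can be checked by brute force. The Python sketch below is an illustration only: it assumes expressions without colons, written as nested lists, with each predicate name looked up in a table of truth values; the function name `value` is hypothetical. A predicate which fails passes over the text up to the next semicolon; if none follows, the expression exits as true, and running off the end of the expression exits as false.

```python
from itertools import product


def value(expr, env):
    """Truth value of a colon-free REC expression over named predicates."""
    i = 0
    while i < len(expr):
        item = expr[i]
        if item == ";":
            return True                   # semicolon: exit as true
        if isinstance(item, list):
            ok = value(item, env)         # a parenthesis is a predicate
        else:
            ok = env[item]
        if ok:
            i += 1
            continue
        while i < len(expr) and expr[i] != ";":
            i += 1                        # pass over the rest of the string
        if i == len(expr):
            return True                   # no semicolon ahead: exit as true
        i += 1                            # resume just after the semicolon
    return False                          # end reached: exit as false


if __name__ == "__main__":
    for a, b, c in product([False, True], repeat=3):
        env = {"a": a, "b": b, "c": c}
        assert value(["a", "b", "c", ";"], env) == (a and b and c)
        assert value(["a", ";", "b", ";", "c", ";"], env) == (a or b or c)
        assert value(["a"], env) == (not a)
        assert value([["a"]], env) == a
    print("AND, OR and complement agree with the truth tables")
```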

A typical REC expression will begin with a series of operators,
followed by a predicate which will decide typical questions such as
whether the calculation is finished and be followed by ;, or whether to
repeat the whole procedure and be followed by :. When these conditions
fail, there will follow further calculation, expressed by a series of
operators, and yet another predicate. One executes as much of a string as
he can until he meets a delimiter, and as many strings as necessary to
meet a terminal condition. One practical caution which has to be
observed is that if several predicates occur in a string, and one has
reached the end of the string, the AND of all these predicates is true.
If one arrives beyond a colon or semicolon, indicating the string has
failed, he only knows the AND has failed, but not which individual
predicate. This requires either a new test of some of the predicates, or
a more cautious rewriting of the REC expression. It is one situation in
which one sometimes wishes there were a more direct control of the flow of
control in a REC expression, perhaps by means of labels and "GO TO's."

To give some very simple examples of the application of REC, let us
bear in mind the PDP-8 computer, which has a teletype coded for 64 ASCII
characters in direct communication with the central processor. Let R
be the operator which reads one such character, either from paper tape
or punched by hand on the keyboard, and W be the operator which sends one
such character to the teletype. The characters are kept in a workspace
(the accumulator, say), and we may imagine 64 operators of the type "x
which place the character x in this workspace, erasing the previous
contents, as well as 64 predicates =x which test the workspace for
equality to the character x.

The REC expression

     (R =!; W " W:)

will doublespace everything which it reads, until the exclamation point
is encountered and it terminates operation.

Let us say that we wish to ignore all text which occurs between
two stars. An appropriate expression will be

     (R =!; =* (R =*;:) : W :)

and again it will terminate when an exclamation point is encountered in the
printing text.
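
The behaviour of these two expressions can be traced with a small interpreter. The Python sketch below is an illustration, not the PDP-8 implementation: expressions are nested lists, the tokens `"x` and `=x` are two-character strings (so `" ` is the operator which loads a blank), input comes from a string rather than the teletype, and the names `run`, `step` and `execute` are hypothetical. The reading of the second expression, in particular the colon after the inner parenthesis, is an assumption drawn from the description of what it does.

```python
def step(token, state):
    """Execute one operator or predicate; operators always return True."""
    if token == "R":
        state["workspace"] = next(state["input"])     # read one character
    elif token == "W":
        state["output"].append(state["workspace"])    # write the workspace
    elif token[0] == '"':
        state["workspace"] = token[1]                 # "x loads x
    elif token[0] == "=":
        return state["workspace"] == token[1]         # =x tests for x
    else:
        raise ValueError("unknown token: " + token)
    return True


def run(expr, state):
    """Run one parenthesized expression; return its value as a predicate."""
    i = 0
    while i < len(expr):
        item = expr[i]
        if item == ":":
            i = 0                         # colon: iterate the expression
            continue
        if item == ";":
            return True                   # semicolon: terminate as true
        ok = run(item, state) if isinstance(item, list) else step(item, state)
        if ok:
            i += 1
            continue
        while i < len(expr) and expr[i] not in (":", ";"):
            i += 1                        # pass over to the next delimiter
        if i == len(expr):
            return True                   # none ahead: final state T
        i += 1
    return False                          # end of expression: final state F


def execute(expr, text):
    state = {"workspace": "", "input": iter(text), "output": []}
    run(expr, state)
    return "".join(state["output"])


DOUBLE_SPACE = ["R", "=!", ";", "W", '" ', "W", ":"]
SKIP_STARS = ["R", "=!", ";", "=*", ["R", "=*", ";", ":"], ":", "W", ":"]

if __name__ == "__main__":
    print(repr(execute(DOUBLE_SPACE, "ab!")))
    print(repr(execute(SKIP_STARS, "ab*cd*ef!")))
```

The input text must contain the terminating exclamation point; this sketch makes no provision for input which runs out first.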

By including operators for the binary conversion of decimal input 
and output, the arithmetic operations, and a test for negative numbers, one
could formulate REC expressions for arithmetic calculations. The domain of 
applicability of REC depends upon its complement of operators and predicates; 
however at present it is only the control structure which interests us. 

Although we are describing a compiler of REC for the PDP-8, the
description is applicable to the majority of machines because the compilation
is made entirely in terms of subroutine calls, except for the part which
corresponds to REC's own flow of control, which is realized for the most
part by appropriate transfers.

In the PDP-8, a subroutine call is made by means of the instruction
JMS (Jump to Subroutine). Let us suppose we have the coding configuration

     X,   JMS Y
          ...

     Y,   0
          ...
          JMP I Y

When the instruction JMS Y, located at address X, is executed, the address
X+1 is stored at Y, and transfer is made to Y+1. When the subroutine is
terminated, this is done by the instruction JMP I Y, an indirect transfer
to Y which is a transfer to X+1, so that the original program is resumed
in sequence.

Data of use to the subroutine Y may be located at addresses X+1,
X+2, and so on, and may be accessed indirectly through the address stored
at location Y. By applying the instruction ISZ Y (Increment and Skip on
Zero), this data may be gathered item by item. Moreover, the subroutine
Y can serve as a predicate, since an ISZ preceding the return jump can
cause a skip to X+2 rather than a return to X+1.
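
The skip return can be followed on a toy model of the machine. The Python sketch below is only an illustration: memory is a dictionary, instructions are tuples, and the instructions OUT and HLT, the addresses chosen for X and Y, and the choice of "accumulator is nonzero" as the predicate are all assumptions made for the example. JMS, JMP, indirect JMP, ISZ and SZA (skip on zero accumulator) are modelled after the PDP-8 instructions of those names.

```python
def run(program, start, accumulator):
    """Interpret a tiny subset of PDP-8 style instructions."""
    memory = dict(program)
    pc, trace = start, []
    for _ in range(1000):                 # guard against runaway programs
        op, *args = memory[pc]
        pc += 1
        if op == "JMS":                   # store return address, enter at Y+1
            memory[args[0]] = pc
            pc = args[0] + 1
        elif op == "JMP":
            pc = args[0]
        elif op == "JMPI":                # JMP I Y: transfer to contents of Y
            pc = memory[args[0]]
        elif op == "ISZ":                 # increment, skip if result is zero
            memory[args[0]] = (memory[args[0]] + 1) % 4096
            if memory[args[0]] == 0:
                pc += 1
        elif op == "SZA":                 # skip if accumulator is zero
            if accumulator == 0:
                pc += 1
        elif op == "OUT":                 # hypothetical: record a message
            trace.append(args[0])
        elif op == "HLT":
            return trace
    raise RuntimeError("program did not halt")


X, Y = 0o200, 0o300                       # arbitrary addresses for the sketch
PROGRAM = {
    X:     ("JMS", Y),                    # call the predicate
    X + 1: ("JMP", X + 4),                # false return: transfer elsewhere
    X + 2: ("OUT", "true"),               # true return: continue in sequence
    X + 3: ("HLT",),
    X + 4: ("OUT", "false"),
    X + 5: ("HLT",),
    Y:     0,                             # receives the return address X+1
    Y + 1: ("SZA",),                      # predicate: accumulator nonzero?
    Y + 2: ("ISZ", Y),                    # true: advance the return to X+2
    Y + 3: ("JMPI", Y),                   # return through Y
}

if __name__ == "__main__":
    print(run(PROGRAM, X, accumulator=5), run(PROGRAM, X, accumulator=0))
```

Here the ISZ is used purely to increment the stored return address; its own skip would occur only if the address wrapped around to zero.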

In this way, the predicate =x may be treated as a composite
predicate, formed from a general subroutine EQ, which uses the character
x as a parameter in the calling sequence. =x would then compile into

     JMS EQ
     x
     (return false)
     (return true)

Clearly, this pattern accounts for predicates with multiple parameters,
including none; the false return will contain a transfer, corresponding to
the spontaneous transition of the transition diagram which the REC expression
defines, while for the true return there will occur further subroutine jumps
corresponding to subsequent operators.

With these preliminaries we may now turn to the CONVERT program
REC, an annotated listing of which we give below.



DEFINE ((

(REC (LAMBDA (L) (PRINTLIST (CONVERT
(QUOTE (

OP PAV (*OR* R W)

For the purposes of this program, three classes of atoms are distinguished:
Operators (OP), Predicates (PR) and compound predicates (CP). In each
category its members are listed, and treated as PAV's by the CONVERT program.

PR PAV (*OR* Q)

CP PAV (*OR* EQ QU)

))

(QUOTE (

X (XXX)

))

L

(QUOTE (*0 (

(PR ((JMS PR) (JMP FA)))

Predicates are compiled as a subroutine call followed by a transfer to FA.
FA is the heading corresponding to the FALSE exit of the segment under
compilation; this transfer is skipped over when the predicate is true.

(OP ((JMS OP)))

Operators are compiled by a simple subroutine call.

((CP X) ((JMS CP) (X) (JMP FA)))

Compound predicates are compiled as predicates, but their parameter is
included as part of the calling sequence.

((**) ((JMP OF) FA))

The CONVERT program is written in such a way that it does not distinguish
CDR of a list from a list. However, these have to be processed differently,
and are therefore distinguished by a double asterisk placed in front of
a fragment which has arisen as CDR of a list. When only the double
asterisk is left, the end of the list has been reached, the spontaneous
transition (JMP OF) corresponding to the fact that each parenthesized REC
expression is regarded as a predicate is inserted, and the heading FA is
placed, since we have now arrived at the first state outside the parenthesis
to which all false exits in the last segment must proceed.

((** CO XXX) ((JMP HE) FA (*SKEL* FA EXPR *GNSY* (*REPT* (** XXX)))))

When, in examining a REC expression element by element, we arrive at a colon
(CTSS DOES NOT LET US WRITE ALL CHARACTERS, AN IDIOSYNCRASY OF THE LISP INPUT
ROUTINE), we write a spontaneous transition to the initial state (JMP HE),
note the false exit point of all predicates in the previous segment, and define
a new false exit point for the ensuing segment. The analysis continues with the
remainder of the REC expression, ** serving as a signal that we do not deal with
a new expression.

((** SC XXX) ((JMP TR) FA (*SKEL* FA EXPR *GNSY* (*REPT* (** XXX)))))

When a semicolon is encountered, a spontaneous transition is made to the TRUE
final state (JMP TR), the false exit point of all predicates in the previous
segment is noted, and a new false exit point is established for the ensuing
segment. The analysis then proceeds with the remainder of the expression.

((** X XXX) ((*REPT* X) (*REPT* (** XXX))))

If neither delimiter is encountered, we compile the CAR and then the CDR of the
expression. CAR's and CDR's are not treated uniformly because a new initial
state has to be established for each subexpression, but not for each CDR.

((===) (*SKEL* HE EXPR *GNSY* TR EXPR *GNSY* OF EXPR FA
       (HE (*SKEL* FA EXPR *GNSY* (*REPT* (** *SAME*))) TR)))



In compiling a parenthesized expression, provision must be made for the
initial state, TRUE final state, and FALSE final state, all of which are
defined by means of *GNSY*. These labels must be included at appropriate
points in the compiled code.

)))
))))

(PRINTLIST (LAMBDA (X) (PROG (Y) (SETQ Y X) (CLOCK ()) A (PRINT (CAR Y))
(SETQ Y (CDR Y)) (COND ((NULL Y) (RETURN (CLOCK T)))) (GO A))))

))

PRINTLIST is an auxiliary function which allows listing the compiled program
with one PDP-8 instruction per line, rather than as a compact list in the
usual manner that LISP would print a result.
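
For readers without access to CONVERT, the rules of the listing can be paraphrased in Python. The sketch below is a restatement under stated assumptions, not the author's program: REC expressions are nested Python lists using the CO and SC spellings, generated labels are G1, G2, ... in place of the symbols produced by *GNSY*, and the name `compile_rec` is hypothetical. The labels are generated in the order HE, TR, FA, as in the final rule of the listing, and the outermost false exit is left as the unresolved name FA.

```python
from itertools import count

OPERATORS = {"R", "W"}        # the OP class of the listing
PREDICATES = {"Q"}            # the PR class
COMPOUND = {"EQ", "QU"}       # the CP class, each taking one parameter


def compile_rec(expr, outer_false="FA", new_label=None):
    """Compile one parenthesized REC expression into a flat list of lines."""
    if new_label is None:
        numbers = count(1)
        new_label = lambda: "G%d" % next(numbers)
    here, true_exit, false_exit = new_label(), new_label(), new_label()
    code = [here]                                   # initial state
    for item in expr:
        if item in ("CO", "SC"):                    # colon or semicolon
            target = here if item == "CO" else true_exit
            code += ["(JMP %s)" % target, false_exit]
            false_exit = new_label()                # exit for next segment
        elif (isinstance(item, list) and item
              and isinstance(item[0], str) and item[0] in COMPOUND):
            code += ["(JMS %s)" % item[0], "(%s)" % item[1],
                     "(JMP %s)" % false_exit]
        elif isinstance(item, list):                # nested expression
            code += compile_rec(item, false_exit, new_label)
        elif item in PREDICATES:
            code += ["(JMS %s)" % item, "(JMP %s)" % false_exit]
        elif item in OPERATORS:
            code += ["(JMS %s)" % item]
        else:
            raise ValueError("unknown element: %r" % (item,))
    # end of list: transfer to the outer false exit, then place FA and TR
    code += ["(JMP %s)" % outer_false, false_exit, true_exit]
    return code


if __name__ == "__main__":
    sample = ["R", ["EQ", "-"], "CO", ["EQ", "CO"], "SC", "W", "R", "Q", "W"]
    for line in compile_rec(sample):
        print(line)
```

A nested expression receives the current false exit of the enclosing segment as its own OF, so that its false outcome behaves exactly as a failed predicate would at that point.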



As an example of the operation of REC we may consider the following.
(REC L) is a function whose argument is the REC expression which
is to be compiled. On account of inherent limitations in the orthography
of the CTSS LISP input routine, certain substitutions had to be made:
CO for :, SC for ;, (EQ X) for =X, (QU X) for "X.



REC ((R (EQ -) CO (EQ CO) SC W R Q W R Q W R Q W))

G03163            Initial point
(JMS R)           R
(JMS EQ)          =-
(-)               parameter
(JMP G03165)      false
(JMP G03163)      :
G03165            false exit of last segment
(JMS EQ)          =:
(CO)              parameter
(JMP G03166)      false
(JMP G03164)      ;
G03166            false exit of last segment
(JMS W)           W
(JMS R)           R
(JMS Q)           Q is an arbitrary predicate
(JMP G03167)      false exit
(JMS W)           W
(JMS R)           R
(JMS Q)           Q
(JMP G03167)
(JMS W)           W
(JMS R)           R
(JMS Q)           Q
(JMP G03167)
(JMS W)           W
(JMP FA)          exit from last segment to FALSE final state
                  (FA is defined at a higher level)
G03167            exit of false Q's in last segment
G03164            TRUE final state, exit of all semicolons

(time of execution)

The program which is generated is incomplete in the sense that it
itself should be finished off as a subroutine, with a blank entry point
bearing an appropriate label, and terminated with appropriate ISZ's and
JMP I's.

For ease of reference we conclude with an unannotated listing of the
program.



DEFINE ((

(REC (LAMBDA (L) (PRINTLIST (CONVERT
(QUOTE (
   OP PAV (*OR* R W)
   PR PAV (*OR* Q)
   CP PAV (*OR* EQ QU)
))
(QUOTE (
   X (XXX)
))
L
(QUOTE (*0 (
   (PR            ((JMS PR) (JMP FA)))
   (OP            ((JMS OP)))
   ((CP X)        ((JMS CP) (X) (JMP FA)))
   ((**)          ((JMP OF) FA))
   ((** CO XXX)   ((JMP HE) FA (*SKEL* FA EXPR *GNSY* (*REPT* (** XXX)))))
   ((** SC XXX)   ((JMP TR) FA (*SKEL* FA EXPR *GNSY* (*REPT* (** XXX)))))
   ((** X XXX)    ((*REPT* X) (*REPT* (** XXX))))
   ((===)         (*SKEL* HE EXPR *GNSY* TR EXPR *GNSY* OF EXPR FA
                  (HE (*SKEL* FA EXPR *GNSY* (*REPT* (** *SAME*))) TR)))
)))
))))

(PRINTLIST (LAMBDA (X) (PROG (Y) (SETQ Y X) (CLOCK ()) A (PRINT (CAR Y))
(SETQ Y (CDR Y)) (COND ((NULL Y) (RETURN (CLOCK T)))) (GO A))))

))